Automatic Knowledge Acquisition for Creating Nuclear Juridical Lexica
Abstract
Knowledge acquisition is the bottleneck in the creation of legal expert systems. Our prototype CONCAT addresses the fact that the information is formalised to a certain degree by the use of legal language. It does so by supporting the creation of a selective thesaurus for a nuclear juridical lexicon, which can be used for automatic indexing and document classification. This selectivity is obtained by distinguishing between precise legal terms and words with fuzzy meanings. The resulting thesaurus can thus automatically represent a lawyer's expert knowledge of legal terminology. CONCAT can therefore be used as a tool for the semi-automatic creation of a legal knowledge base that automatically represents the structure and contents of the documents. As structuring techniques we applied cluster analysis and neural networks. The cluster algorithm produces satisfactory results but turned out to be very sensitive to the fine-tuning of its parameters. The neural network achieves comparable quality without the need for any adaptation.
Similar resources
ATOLL - A framework for the automatic induction of ontology lexica
There is a range of large knowledge bases, such as Freebase and DBpedia, as well as linked data sets available on the web, but they typically lack lexical information stating how the properties and classes they comprise are realized lexically. Often only one label is attached, if at all, thus lacking rich linguistic information, e.g. about morphological forms, syntactic arguments or possible le...
Morphology Based Automatic Acquisition of Large-coverage Lexica
In this article, we introduce a new technique for constructing wide-coverage morphological lexica from large corpora and morphological knowledge, with an application to French. Basically, it relies on the idea that the existence of a hypothetical lemma can be guessed if several different words found in the corpus are best interpreted as morphological variants of this lemma. We first validated o...
Generation of multilingual ontology lexica with M-ATOLL: a corpus-based approach for the induction of ontology lexica
While there are many large knowledge bases (e.g. Freebase, Yago, DBpedia) as well as linked data sets available on the web, they typically lack lexical information stating how the properties and classes are realized lexically. If at all, typically only one label is attached to these properties, thus lacking any deeper syntactic information, e.g. about syntactic arguments and how these map to th...
Training of Lexica for Subword-Based Speech Recognisers
In this paper we present an automatic optimal baseform determination algorithm. Given a set of subword Hidden Markov Models (HMMs) and acoustic tokens of a specific word, we apply the tree-trellis N-best search algorithm to find the optimal baseforms (transcriptions) in the maximum likelihood sense. The proposed algorithm is used in an iterative manner, creating a series of lexica trained from the...
Factors affecting the acquisition of expert tacit knowledge. Case study: Delivery time in twin pregnancy
This paper identifies the variables needed for creating models of tacit knowledge acquisition, especially in medical care services. The case studied here was knowledge of diagnosis and time of delivery in twin pregnancy with nuchal translucency screening. This paper covers the empirical work undertaken in semi-structured interviews based on thematic analysis. With regard of theoretical...